Biologically Plausible Speech Recognition with LSTM Neural Nets

Authors

Alex Graves

Douglas Eck

Nicole Beringer

Jürgen Schmidhuber

Abstract

Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) are local in space and time and closely related to a biological model of memory in the prefrontal cortex. Not only are they more biologically plausible than previous artificial RNNs, they also outperformed them on many artificially generated sequential processing tasks. This encouraged us to apply LSTM to more realistic problems, such as the recognition of spoken digits. Without any modification of the underlying algorithm, we achieved results comparable to state-of-the-art Hidden Markov Model (HMM) based recognisers on both the TIDIGITS and TI46 speech corpora. We conclude that LSTM should be further investigated as a biologically plausible basis for a bottom-up, neural net-based approach to speech recognition.

Similar resources

Classifying Unprompted Speech by Retraining LSTM Nets

We apply Long Short-Term Memory (LSTM) recurrent neural networks to a large corpus of unprompted speech, the German part of the VERBMOBIL corpus. Training first on a fraction of the data, then retraining on another fraction, both reduces time costs and significantly improves recognition rates. For comparison we show recognition rates of Hidden Markov Models (HMMs) on the same corpus, and provide ...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognition

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they are able to provide the learned dynamically changing contextual window of all sequence history. On the other hand, the convolutional neural networks (CNNs) have brought significant improvements to deep feed-forward neural networks (FFNNs),...

Biologically inspired emotion recognition from speech

Emotion recognition has become a fundamental task in human-computer interaction systems. In this article, we propose an emotion recognition approach based on biologically inspired methods. Specifically, emotion classification is performed using a long short-term memory (LSTM) recurrent neural network which is able to recognize long-range dependencies between successive temporal patterns. We pro...

Deep LSTM for Large Vocabulary Continuous Speech Recognition

Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, it is more difficult to train a deeper network. We introduce a training framework with layer-wise training and e...

Publication date: 2004
